Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Array Based Dynamic Graph #135

Merged
merged 8 commits into from
Jan 29, 2015
Merged

Array Based Dynamic Graph #135

merged 8 commits into from
Jan 29, 2015

Conversation

plofgren
Copy link
Contributor

I've implemented a dynamic directed graph, using an ArrayBuffer of IntArrayList (from the fastutil library). If n nodes are used, O(n) objects are created, independent of the number of edges.

Comparison to the current class: Note that the current dynamic graph class SynchronizedDynamicGraph uses a ConcurrentHashMap to store nodes, which has more overhead than an ArrayBuffer when the nodes are mostly sequential. Also, each node stores an ArrayBuffer[Int], which I belive boxes the ints into objects, creating overhead relative to fastutil's IntArrayList. It also has non-trivial synchronization overhead; in particular when iterating over the neighbors of a node, the neighbors are first copied into a new array, then an iterator to the new array is returned. The current class is better when the node Ids are very non-sequential or when automatic synchronization is needed, but otherwise I believe the new class is more efficient. In a very informal comparison on a graph with several hundred million edges, the currrent class wouldn't load using 30GB of heap, while the new class fit the graph in 7.3 GB of RAM.

@pankajgupta Could you take a look at this, or point me to someone else?

Thanks!

@@ -61,4 +61,6 @@ object FastUtilUtils {

def newInt2IntOpenHashMap(): mutable.Map[Int, Int] =
new Int2IntOpenHashMap().asInstanceOf[jutil.Map[Int, Int]]

def intArrayListToSeq(list: IntArrayList): Seq[Int] = list map { _.toInt }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list.toIntArray() will also work, but this is fine too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will be O(n) btw

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to use this at all (see my previous comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point about O(n)! [edit: I'm now using the IndexedSeq wrapper you propose below, so it should be O(1)]

@pankajgupta
Copy link
Contributor

Consider adding benchmark in cassovary-benchmark subproject as well to demo the performance of this.

// outboundLists(id) contains the outbound neighbors of the given id,
// or null if the id is not in this graph.
// If we aren't storing outbound neighbors, outboundLists will always remain size 0.
private val outboundLists = new ArrayBuffer[IntArrayList]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it is more efficient than ArrayBuffer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea.

@plofgren
Copy link
Contributor Author

I probably won't get to the benchmark for a few days, but I'll post once I have it, and I'll make a call then about multi-threading.

@pankajgupta
Copy link
Contributor

ok regarding benchmark

@plofgren
Copy link
Contributor Author

I needed a mutable undirected graph, so I just added code to support undirected graphs (i.e. stored direction Mutual).
@pankajgupta @szymonm Please comment on the new commit when you get a chance.

@plofgren
Copy link
Contributor Author

I actually didn't realize until today that "Mutual" meant undirected and was starting to write my own UndirectedGraph wrapper class when I noticed it. Should I add a comment to StoredGraphDir along the lines of
Mutual, // In a mutual graph, the outbound and inbound neighbors of each node are equal, and space is typically saved by only storing the outbound neighbors of each node
?

@pankajgupta
Copy link
Contributor

LGTM. Yes, Mutual has that intention, but it will be better for there to be a barebones UndirectedGraphWrapper class for readability and to avoid the surprise you ran into.
So let's close this PR and please feel free to take up that into a new PR.

pankajgupta added a commit that referenced this pull request Jan 29, 2015
A single threaded dynamic graph implementation that keeps nodes and each node's adjacencies in native arrays.
@pankajgupta pankajgupta merged commit 595d3e7 into twitter:master Jan 29, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants